DiffCircaPipeline: a framework for multifaceted characterization of differential rhythmicity

Abstract Summary Circadian oscillations of gene expression regulate daily physiological processes, and their disruption is linked to many diseases. Circadian rhythms can be disrupted in a variety of ways, including differential phase, amplitude and rhythm fitness. Although many differential circadian biomarker detection methods have been proposed, a workflow for systematic detection of multifaceted differential circadian characteristics with accurate false positive control is not currently available. We propose a comprehensive and interactive pipeline to capture the multifaceted characteristics of differentially rhythmic biomarkers. Analysis outputs are accompanied by informative visualization and interactive exploration. The workflow is demonstrated in multiple case studies and is extensible to general omics applications. Availability and implementation R package, Shiny app and source code are available in GitHub (https://github.com/DiffCircaPipeline) and Zenodo (https://doi.org/10.5281/zenodo.7507989) Supplementary information Supplementary data are available at Bioinformatics online.


The study design and notations
DiffCircaPipeline is designed for studies that compare two contrasting groups for differential rhythmicity, denoted as group I (reference) and II (comparison). The analyses require the input of biological measurements from samples and corresponding sample collection times. The time of day is converted to Zeitgeber time (ZT) based on the onset of light (Figure 1), where ZT0 corresponds to sunrise. On average, ZT0 is often referred to 6 AM but the actual sunrise time depends on the date and location (latitude and longitude), for which we have provided a function DCP getZT for convenient conversion. In this diagram, we assume that sunrise time (the start of the light cycle) is 6 AM and sunset time (the start of the dark cycle) is 18 PM. In reality, these time points change with location and day in a year.
We denote the input biological data as Y 1 ∈ R G×N1 and Y 2 ∈ R G×N2 , with the Zeitgeber time of the samples t 1 ∈ R N1 and t 2 ∈ R N2 . Here N 1 and N 2 are sample sizes of group I and group II, respectively, and G is the number of biological features (e.g., genes or methylation features). The cosinor model: characterizes the gene rhythmicity with four parameters: amplitude (A), phase (ϕ), Midline Estimating Statistic Of Rhythm (MESOR, M ) and the noise level (σ). The peak time equals phase under this formula, so the two terms will be used interchangeably. The constant ω = 2π P is pre-specified, and P = 24 hours is used for circadian rhythmicity although different P could also be used (e.g., 12-hour rhythmicity). Specifically, the signal to noise ratio SNR = A/σ describes the strength of rhythm, which is equivalent to the goodness of fit statistic R 2 under mild conditions (see Proposition 2 in Section 2.3). The subscript of parameters indicates the group number, i.e. A 1 is the amplitude of group I and A 2 is for group II. Parameters without the subscript are shared by both groups, i.e. A = A 1 = A 2 .

Recommended methods and alternatives
Before input to DiffCircaPipeline, data should be processed appropriately. Examples include filtering out low-expressed features, removing low-quality samples or outliers, normalization between samples, log transformation, etc. The processing procedure varies between data platforms and thus is not included in DiffCir-caPipeline.

Categorization of types of rhythmicity
In step (a) ( Figure 1A(a) in the main text), we categorize biomarkers into four types of rhythmicity (TOR): arrhythmic in both groups (Arrhy), rhythmic in only group I (RhyI), rhythmic in only group II (RhyII), and rhythmic in both groups (RhyBoth). This step is essential because the differential rhythm fitness test is only meaningful if a biomarker is rhythmic in at least one group, while the differential rhythm parameter test is only meaningful with biomarkers rhythmic in both groups.
Denoting the rhythmicity p-values from the two groups as p 1 and p 2 , DiffCircaPipeline provides three methods for this purpose (more available methods and detailed descriptions can be found in Section 2.2): • (Default) SSMS with Sidak.FS (Selective sequential model selection with Sidak adjustment and For-wardStop). This method sequentially selects the rhythmic groups for each biomarker by applying a pre-specified cutoff to p 1 and p 2 (details in Section 2.2).
• VDA (Venn diagram analysis). This method compares p 1 and p 2 with a given cutoff separately to get lists of biomarkers that are rhythmic in each group. Then the overlap of the two biomarker lists are considered RhyBoth.
• AW (Adaptively weighted Fisher's method). This method calculates a combined p-value from p 1 and p 2 indicating the existence of rhythm in at least one of the groups. At the same time, weights (0 or 1) are estimated for the two groups to indicate TOR (i.e., the weight is (1, 1) for RhyBoth biomarkers and (1, 0) for RhyI biomarkers).
We describe these tests in detail in Section 2.3 and benchmark them in Section 3.2. The likelihood ratio test is recommended due to the most consistent type I error control over varying SNRs and under the existence of differential parameters. The computation speed of the likelihood ratio test is also the fastest compared to the two sampling-based methods.
In the literature, differential rhythmicity fitness has been tested by whether R 2 1 = R 2 2 (Ketchesin et al. (2021)) or by σ 2 1 = σ 2 2 (Ding et al. (2021)). We find that a difference in R 2 is more meaningful and more concordant with a difference in oscillation amplitude and rhythmicity p-value. However, a difference in σ 2 does not imply changes in other rhythmicity characteristics. As a result, only testing for R 2 is pursued in DiffCircaPipeline. Justification of the choice to test the differential R 2 rather than the differential σ 2 can be found in Section 4.1.3.

Differential rhythm parameter test
The differential rhythm parameter test is performed for only RhyBoth biomarkers to detect the difference in the three rhythm curve parameters (A 1 = A 2 , ϕ 1 = ϕ 2 or M 1 = M 2 ). If one is only interested in testing one of the parameters, the likelihood ratio test in diffCircadian package (Ding et al. (2021)) will be performed. However, if the change in more than one parameter is of interest, we perform the two-stage differential rhythmicity test with first-stage global test followed by second-stage post hoc individual parameter tests. Two global tests are available: (1) change in A and ϕ together, or (2) change in A, ϕ, and M .
Performing a two-stage differential rhythmicity test rather than multiple individual tests helps avoid the type I error inflation from multiplicity. Section 2.4 describes the tests and Section 3.3 validates the type I error control by simulations.

Estimation and testing of cosinor model
We recall the cosinor model Cornelissen (2014): where ϵ i ∼ N (0, σ 2 ) and ω = 2π/P is pre-defined. For circadian rhythmicity, we have P = 24 hours. Applying the trigonometric conversion cos (x + y) = cos x cos y − sin x sin y, Equation 1 transforms to with β 1 = A cos ωϕ and β 2 = A sin ωϕ. With cos (ωt i ) and sin (ωt i ) being constants transformed from t, β 1 and β 2 can be readily estimated by ordinary least square regression. Then we calculate the rhythm parameters by: where K is an integer.
To detect a rhythmic biomarker, we test the significance of rhythmicity with hypotheses H 0 : A = 0 and H 0 : where RSS is the residual sum of squares and TSS is the total sum of squares: With the critical level α, the null hypothesis is rejected when F > F 1−α,2,N −3 (F q,a,b is the qth percentile of F distribution with degrees of freedom a and b).

Categorization of types of rhythmicity
After fitting the expression data with the cosinor model for the two groups separately, we categorize the biomarkers into four types of rhythmicity (TOR): arrhythmic in both groups (Arrhy), rhythmic in only group I (RhyI), rhythmic in only group II (RhyII), and rhythmic in both groups (RhyBoth).
For this purpose, one commonly used method is Venn diagram analysis (VDA), which is inappropriate due to the lack of concern for multiplicity and arbitrary cutoff choice. It is also pointed out in Pelikan et al. (2021) that VDA tends to classify Arrhy and RhyBoth biomarkers to RhyI or RhyII. The authors then proposed two schemes in the package compareRhythm: hypothesis testing and model selection (described below) to classify the biomarkers into five categories (the corresponding category in DiffCircaPipeline are listed in the parenthesis and will be used thereafter): arrhythmic (Arrhy), gain of rhythmicity (RhyII), loss of rhythmicity (RhyI), same rhythmicity (RhyBoth), change of rhythmicity (RhyBoth). Note that "same rhythmicity" or "change of rhythmicity" from compareRhythm are concatenated to "RhyBoth" in our paper due to difference in definition of "change of rhythmicity".
In DiffCircaPipeline we propose to apply and generalize the selective sequential model selection (SSMS) G' Sell et al. (2016) procedure. To illustrate, we build a nested model path starting from the simplest model where the biomarker is not rhythmic in either groups (M D 0 ). The next model in the path has the biomarker rhythmic in only one group (M D 1 ), and the full model has the biomarker rhythmic in both groups (M D 2 ).Then we use stopping rules to select a parsimonious model on the model path. Figure 2 summarizes the construction of the model path according to the order of rhythmicity p-values p 1 and p 2 . The models in Figure 2 are: Here g 1i = 1 and g 2i are index functions for if sample i is from group I or group II, correspondingly. Here p 1 and p 2 denote rhythmicity p-values from group I and group II, respectively. p (1) and p (2) is the smaller and the larger one of p 1 and p 2 .
Rhythmicity p-values p 1 and p 2 are generated by testing M D 0 against M D I 1 in group I data and testing M D 0 against M D II 1 in group II data separately. We use p (1) = min(p 1 , p 2 ) and p (2) = max(p 1 , p 2 ) to denote the smaller and the larger one of them respectively, and denote p (1) and p (2) as single step p-values for step one and step two. Next, stopping rules are applied to select a parsimonious model. Denote k as the step that we stop at, e.g., when the true category is RhyBoth, the step two model M D 2 is true and k = 2. Given a critical value α, we consider the following two stopping rules: • BasicStop (BS) (Marcus et al. (1976)) selects the largest model with single-step p-value smaller than α:k B = max{k : p k < α} • ForwardStop (FS) (default): considers all the preceding single-step p-values: Note that using rhythmicity p-values as the single step p-values is anti-conservative because we keep selecting the smallest p-value, which is biased without a pre-determined path ). So the critical value of the stopping rules is adjusted for the sequential steps using Sidak (Šidák (1967)): α * = 1 − (1 − α) 1/d , where d = 2 is the number of single-step p-values. In Section 3.1, we will perform simulation and compare the performance of the selective sequential model selection method with ForwardStop (Sidak.FS) or BasicStop (Sidak.BS) with VDA, the two compareRhythm methods (hypothesis testing and model selection), and the AW-Fisher method used by Ketchesin et al. (2021): • Selective sequential model selection methods (Sidak.BS and Sidak.FS): The model path and single-step p-values are computed as described above. Then we tested BS and FS for model selection with a Sidak adjusted critical value.
• Venn Diagram Analysis (VDA): Compare p 1 and p 2 to α separately and use Venn Diagram to decide assignment of the four TOR categories.

Differential rhythm fitness test
For rhythmic biomarkers in at least one group, we ask if there is a difference in the data's rhythm fitness to the cosinor model across groups. The rhythm fitness is characterized by the goodness-of-fit statistics R 2 : In Chen et al. (2016) the difference in R 2 is tested using permutation test. In the next subsections, we propose a likelihood ratio test and a bootstrap test for the same purpose. All the methods are compared in section 3.2. The hypothesis for testing R 2 is:

Likelihood ratio test (LR)
It is difficult to derive the exact analytical form for differential R 2 test. However, with Proposition 1-2, we proved that difference in R 2 can be approximated by difference in the signal to noise ratio (SNR = A/σ). We then perform likelihood ratio test for equal SNR under the null hypothesis: Proposition 1.
Taking expectation for both sides with bounded R 2 (R 2 ≤ 1), we get .
Proposition 2. Under the condition that the sampled time points for the two groups are the same, testing R 2 is equivalent to testing A σ . Proof. Note that E(RSS) = σ 2 (n − r), where r = 3 is the number of parameters in equation 1. Denotê )) 2 is the same for both groups, i.e. the sampled time points are

Permutation test
The permutation test is performed by randomly shuffling the samples along with zeitgeber time between two groups for B times. The bth shuffling produces the pseudo data: 2 . Then we fit the pseudo data with cosinor model and calculate R 2 g for biomarker g (R 2 = 1 − RSS TSS ). The empirical null distribution for ∆R 2 is then ( The p-value for an observed ∆R 2 g is calculated as . Note that B should be a relatively large number and we use B = 1000 throughout the paper.

Bootstrap test
Another non-parametric test we propose is the bootstrap test. Similarly, we sample with replacement within the group for B times to produce the pseudo data and calculate the empirical null distribution of ∆R 2 g . The p-value for an observed ∆R 2 g is

Two-stage differential rhythm parameter test
For RhyBoth biomarkers, one might be interested in testing if any of the rhythmicity parameters (A, ϕ) or the MESOR are statistically different between the two groups. The DiffCircaPipeline performs a two-stage DR parameter test, which comprises a global test followed by post hoc individual parameter tests. We provide two commonly-used integrated hypotheses for the first-stage global tests: ), F test between the following null and alternative models are performed: where g 2i = 1 if sample i belongs to group II and g 2i = 0 if otherwise. In the null model only difference in MESOR between the two groups is allowed while in the alternative model the extra cosinor term accounts for the difference in both A and ϕ.
Similarly, the null and alternative models to test A, ϕ and M (Equation 4) are If the global F tests reject the null, we then perform the second-stage post hoc tests for individual parameters with a Sidak-adjusted p-value threshold for the number of parameters tested. If the post hoc tests reject the individual null, conclusions can be made on individual differential parameters.
When only one parameter is of interest (e.g., differential phase), the corresponding individual test using the diffCircadian package will suffice. Note, in Ding et al. (2021), it is shown that the permutation test, Circacompare, and the LR tests are equally powerful with well controlled type I error. So the choice of using diffCircadian here is by convenience without further validations. In section 3.3, we will benchmark the performance of the two-stage tests with type I error control and power analysis with simulation data.

Numerical evaluation and justification of method choices in the pipeline
Simulations in this section provide justifications for recommended methods of each analytical step: (1) Sidak.FS for categorizing TOR; (2) likelihood ratio test for detecting DR fitness biomarkers (∆R 2 ); (3) the two-step differential rhythm parameter test for detecting multiple differential paramters.

Evaluating methods for classifying types of rhythmicity categories
For evaluations in this subsection, we will borrow the benchmarks of type I error from selective sequential model selection (Fithian et al. (2015)). The aim is to select the simplest model that best describes the data's sampling distribution. Thus, a type I error occurs when we reject the lower model and accept an upper model while the lower model is adequate. In the context of our classification question, such type I error occurs when: (1) an Arrhy biomarker (k = 0) is classified to be any other types (k = 1 ork = 2), denoted as TypeI 0 = P (k > 0|k = 0); (2) a RhyI or a RhyII biomarker (k = 1) is classified to RhyBoth (k = 2), denoted as Note that, a RhyI biomarker could also be incorrectly classified to RhyII or Arrhy, where a wrong model rather than a correct model with redundant variables is selected. Thus it is more meaningful to discuss a conditional type I error given a selection of the correct model: We then define the power as the probability of classifying biomarkers to their true category: Power 2 = P (k = 2|k = 2).

Simulation setting
We simulate biomarker expression data from the cosinor model: Rhythmic biomarkers are generated with SNR = A/σ > 0, while arrhythmic biomarkers are generated with A = 0, σ = 1. The method RAIN requires equal spaced time points. To compare all the methods with the same simulated data, we sampled data at time points 0, 4, 8, 12, 16, 20 (unit is hour), each repeated 5 times, which makes 30 samples in total for each group. Since M is independent from the rhythmicity signal, we draw M from UNIF(5, 10) for both groups. We simulated 10,000 Arrhy, RhyI, and RhyBoth biomarkers (RhyII is equivalent to RhyI, thus omitted) and compared the performance of different methods based on TypeI 0 , cTypeI 1 , Power 1 and Power 2 . All the settings are then repeated 10 times for variance estimation. For RhyI biomarkers we further studied the impact of SNR: A 1 /σ 1 = 0.5, 0.6, 0.7, 0.8, 0.9, 1 with fixed σ 1 = 1 and ϕ 1 = 0. This range covers the SNR we observe from real human data. When SNR > 1, all the methods show high power (greater than 0.8) with well-controlled TypeI 0 and cTypeI 1 thus is not shown. For RhyBoth biomarkers we simulated the following variations to study the impact of different SNR, shifted phase and unbalanced SNR between groups: • Same SNR: Both Groups have the same SNR with A 1 /σ 1 = A 2 /σ 2 = 0.5, 0.6, 0.7, 0.8, 0.9, 1 with fixed σ 1 = σ 2 = 1 and ϕ 1 = ϕ 2 = 0.
The poor performance of VDA in controlling TypeI 0 is expected due to its lack of concern for multiplicity, and its empirical type I error is consistent with the theoretical deviation (1 − (1 − 0.95) 2 = 0.0975). RAIN.DODR is even more anti-conservative, which may be due to the excessive sensitivity of RAIN to identify rhythmic biomarkers with a non-parametric procedure. The AIC-based model selection is the most anti-conservative, which agrees with the long-observed fact that AIC tends to favor more complex models given the small penalty it gives to the number of parameters selected (Shen and Ye (2002), Aho et al. (2014)). In our case, this property translates into selecting more groups to be rhythmic. AW-Fisher is theoretically guaranteed to detect overall signal (i.e., if the biomarker is rhythmic in any groups), but has a poor finite sample performance in the weight estimation (i.e., which group is rhythmic), which is consistent with its perfect performance in controlling TypeI 0 but not cTypeI 1 . Similarly, BIC's advantage of selecting the correct model is also not guaranteed with a finite sample (Aho et al. (2014)

Power analysis
When the biomarkers are rhythmic in only one group, Sidak.BS and Sidak.FS perform similarly with Power 1 increasing with SNR ( Figure 4A). When biomarkers are rhythmic in both groups with the same SNR or different SNR, Sidak.FS is always higher in Power 2 than Sidak.BS ( Figure 4B). Difference in phase does not impact Power 2 ( Figure 4C).

Genome-wide false discovery rate of Sidak.FS
To control the genome-wide Type I error, the p-values from fitting M D I 1 and M D II 1 (Equation 2.2 and 2.2) for all the biomarkers are adjusted separately using the Benjamini-Hochberg Procedure (Benjamini and Hochberg (1995)) and then input to the selective sequential model selection procedure introduced in section 2.2.
We then extend the definition of false discovery rate (FDR) and expected discovery rate (EDR) to mirror those of type I error and power defined at the start of section 3.1: ]. FDR of classifying Arrhy biomarkers to be rhythmic in any groups.
The proposed method controls FDR very well at the nominal level 0.05 for both FDR 0 and FDR 1 ( Figure 5A) and achieves high genome-wide power (EDR) for all types of rhythmic biomarkers ( Figure 5B).

Evaluating methods for differential rhythm fitness tests
In section 2.3 we proved that testing R 2 between two groups is approximately identical to testing SNR = A/σ under mild conditions. As a result, we simulate the two groups with the same SNR to evaluate the type I error control of the compared methods and then different SNR for power analysis.
For power analysis we fix SNR 1 = 3 while changing SNR 2 = 0.4, 0.6, 0.8, 1.2, 1.5, 2, 2.5, 3, 3.5, 4, where we B. EDR of Sidak.FS Figure 5: Genome-wide FDR and EDR: the number on the bars are mean (SD) from the simulation. fix A 1 = A 2 = 3 for both groups and only change σ 2 . Same variations with M and ϕ as above are simulated. All the settings for type I error control and power analysis are simulated with 10,000 biomarkers and 20 samples with time variables drawn from U N IF (0, 24), and the procedure was repeated 10 times.

Type I error control
When the two groups have the same A, ϕ, and M as well as SNR, the type I error control varies with the shared SNR value for all three methods ( Figure 6A). The type I error rate of the bootstrap test fluctuates wildly and is higher than the nominal level in most cases. The type I error rate increases slowly with SNR but is generally well controlled with the permutation test. With LR, the type I error is conservative with low SNR but is steady around the nominal level when SNR > 1. Also, the method LR has stable standard deviation (SD) for type I error rate, while both Bootstrap and Permutation methods have increasing SD with SNR. Furthermore, when differential parameters exist, the type I error control of the permutation test varies drastically, while it is unaffected for LR and the bootstrap test (Figure 7). In summary, only the LR test controls type I error well under the nominal level with varying SNR values or presence of differential parameters and thus, it is adopted as the default method in DiffCircaPipeline.

Power analysis
Use SNR 1 = 3 as the baseline (the blue dotted line in Figure 6B), statistical power of the three tests increase when the effect size (∆R 2 ) increases, while LR constantly shows the largest power.

Evaluating the two-stage differential parameter test
To benchmark, Type I error of the first-stage global test is defined as the probability of rejecting global null (Equation 3 or 4) given the null is true. Type I error of the second-stage post hoc test is the probability of falsely rejecting the corresponding individual parameter null after rejecting the global null.

Simulation setting
Only RhyBoth biomarkers are simulated to evaluate the DR parameter tests. Time points are generated from UNIF(0, 24). We simulated the same data sets for testing H 0,(A,ϕ) and H 0,(A,ϕ,M ) with only the testing procedures changed. In each simulation, 10,000 biomarkers under null or under alternative are simulated and are repeated 10 times for estimation of variance. In all the simulations we fix the noise level σ = 1.

Type I error control
When the data are simulated under the null, the value of parameters does not affect the type I error, so only results at A = 3, ϕ = 0 and M = 5 are shown. The first-stage global test controls the type I error rate perfectly at the nominal level in both testing scenarios (orange bars in Figure 8A and B). In the second-stage post hoc test, the union of the two or three post hoc tests is only slightly conservative (blue bars in Figure  8A and B). In contrast, if the parameter tests are performed separately without multiplicity correction, the test is very anti-conservative with inflated type I error to 10-15% (blue bars in Figure 8A and B).

Case studies
In this section, we illustrate the workflow of DiffCircaPipeline with three case studies to demonstrate the functionality and wide-applicability of the software. The first study compares the transcriptomic rhythmicity between two brain regions, nucleus accumbens (NAc) and caudate, focusing on healthy human subjects. In the second example, we study the differential rhythmicity between subjects with schizophrenia and unaffected comparison subjects in dorsal lateral prefrontal cortex (dlPFC) region in brain. Finally, we show that the DiffCircadianPipeline is applicable to general omics data by applying to a DNA methylation data set. Ketchesin et al. (2021) pairwisely compared the transcriptomic rhythmicity between three regions (NAc, caudate, and putamen). Here, we apply DiffCircaPipeline for DR between NAc and caudate only. The RNA-Seq data is quantified from the same cohort of 59 psychiatric-disorder-free subjects. The subjects' recorded time of death (TOD) was normalized to the zeitgeber time (ZT) scale and used as the expression time for biomarkers. The data is publicly available at Gene Expression Omnibus (GEO) with accession number GSE160521 and more details about the data could be found in Ketchesin et al. (2021).

Identification of types of rhythmicity
We used Sidak.FS to identify the TOR for the two brain regions and identified 546 RhyBoth genes, 1,487 RhyI genes and 1,638 RhyII genes at p-value cutoff 0.05 ( Figure 9A). At FDR=30% cutoff, we identify 396 RhyBoth genes, 1,174 RhyI genes and 1,015 RhyII genes. With pathway enrichment analysis tool Metascape Circos plots showing phase difference between the two brain regions. Each dot on the plot represents one RhyBoth gene. The angular axis is the gene's phase in NAc, and the radius is the phase difference comparing caudate to NAc. The radius axis is arranged such that the more interesting genes with peak differences greater than 4 hours are displayed at the outer circle for better visualization. The light-green-shaded area bounds the genes with a peak difference smaller than 4 hours. The non-grey genes have p-values smaller than 0.05 from the differential phase test (without multiple comparison correction for exploratory purpose). The differential phase genes of NAc and caudate are separated into three clusters by color: blue, yellow and green. E. Top pathways enriched for three clusters of differential phase genes. (Zhou et al. (2019)), we performed pathway enrichment tests for all genes rhythmic in NAc (RhyI and RhyBoth), all genes rhythmic in caudate (RhyII and RhyBoth), and the RhyBoth genes. Within the 546 RhyBoth genes, we find enrichment in circadian clock and cell cycle regulation pathways ( Figure 9B). The top predicted upstream regulators include the core clock genes like ARNTL, CLOCK, and NPAS2 ( Figure  9C).

Genes with phase shift between two brain regions
We are interested in the genes that reach the peak expression at a different time in different brain regions. In Figure 9D, the peak time of RhyBoth genes between NAc and caudate are shown and the differential phase genes form three clusters, e.g., the green cluster peaks between ZT 10 to ZT 14 in NAc, and its peak time in caudate is around 8 hours earlier. Similar phase shifts also exist in the yellow cluster and the blue cluster. Pathway enrichment results in Figure 9E reveal that cluster 2 is involved in cell division-related processes like DNA replication, histone binding, G2/M checkpoint, chromatin organization, and centrosome cycle. Cluster 3 is enriched in pathways related to ribosome components, apoptosis, and membrane-associated processes. These findings imply that these cellular activities occur in NAc and caudate at different times, which may be associated with a functional difference between the two brain regions.  Figure 10: Comparing rhythms between differential R 2 and differential σ 2 genes. A. (Top) Scatter plots of A, σ 2 , R 2 , and transformed rhythmicity p-values of genes with differential R 2 between NAc and caudate with a cutoff of p-value < 0.05. The gray dashed line is the diagonal reference line. (Bottom) Scatter plots of gene expression (y-axis) and zeitgeber time (x-axis) of the top 4 detected genes in NAc and caudate with ∆R 2 > 0 and ∆R 2 < 0 respectively. B. Same plots as in A using differential σ 2 test. C. Counterexamples for difference between RhyI/RhyII and ∆R 2 genes.
4.1.3 Comparing differential R 2 and differential σ 2 results Current methods that allow for differential rhythm fitness test for difference in σ 2 (DODR and diffCircadian). However, in real application we found that comparing R 2 is more meaningful than comparing σ 2 . Here we justify the decision by comparing the differential R 2 genes and the differential σ 2 genes between NAc and caudate, both with a p-value cutoff of 0.05. As in Figure 10A, an increase in R 2 is almost always associated with larger A and a smaller rhythmicity p-value. Likewise, a decrease in R 2 is almost always associated with smaller A and a larger rhythmicity p-value. Thus we conclude that genes have larger R 2 is more rhythmic.
In contrast, a change in σ 2 is not a strong sign of the change of A and rhythmicity p-value. As a result, we test R 2 difference for a strong indicator of change in A and rhythmiciy p-value. The concept of differential rhythm fitness may look similar to detecting rhythmicity only in one group but not in the other with the SSMS procedure (RhyI/RhyII), but they are for different purposes. We show two counterexamples for the illustration. First, The SSMS procedure is dependent on a dichotomous cutoff, as a result, RhyI/RhyII genes may not have a large difference in rhythm fitness ( Figure 10C Example 1). Furthermore, in Figure 10C Example 2, a RhyBoth gene is weakly rhythmic in caudate with a p-value of 0.01 and strongly rhythmic in NAc, which results in a sigficant ∆R 2 (p=0.023).

Case study two: rhythm comparison between subjects with schizophrenia and unaffected comparison subjects with RNA-Seq data
The CommonMind Consortium (Fromer et al. (2016)) collected RNA-Seq data from dorsolateral prefrontal cortex (dlPFC) of subjects with schizophrenia (SCZ) and unaffected comparison subjects (UC). After preprocessing, we applied the pipeline to 46 pairs of subjects. Each pair consists of one subject with SCZ and one UC subject, matched by sex and age (Seney et al. (2019)).

Identification of types of rhythmicity
With Sidak.FS and a p-value cutoff of 0.05, we identified 268 RhyI genes, 252 RhyII genes, and 12 RhyBoth genes. INGENUITY pathway analysis (IPA) software (Qiagen) was used for pathway enrichment analysis. In UC subjects, the most enriched pathway of the rhythmic genes is the circadian rhythm signaling (p=0.0085), and the top predicted upstream regulators are core clock genes PER1, Arntl-Clock complex, CRY2, PER2, CRY1, NPAS2. While in subjects with SCZ, the most enriched pathways are coronavirus replication pathway (p=0.0023), oxidative phosphorylation (p=0.0083), and cholesterol metabolism pathways. It is intriguing to observe a different set of circadian genes in subjects with SCZ and the common circadian genes with very few RhyBoth genes. As a result, DR parameter test is not applicable in this dataset and only DR fitness test is performed in the next subsection.

Genes with differential rhythm fitness
Next, we performed a differential rhythm fitness test for the 532 genes rhythmic in more than one group.
With the likelihood-ratio test and a p-value cutoff of 0.1, we identified 97 genes with a significant change of R 2 , where 60 increased R 2 fitness from UC to SCZ and 37 decreased. Calcium signaling is the most enriched pathway for genes with increased R 2 fitness (p=0.0031), which is supported by many studies showing association of elevated activity of Ca 2+ signaling in subjects with SCZ (Berridge (2013)). The pathway most enriched for genes with decreased R 2 is LPS/IL-1 Mediated Inhibition of RXR Function (p = 0.0027). This observation is corroborated in Vitale et al. (2017), where the same pathway is enriched in the differentially methylated loci between subjects with SCZ and UC subjects in three types of cells (induced pluripotent stem cells, olfactory neurosphere-derived cells, and fibroblasts). Scatter plots of representative genes in LPS/IL-1 (loss of rhythmicity) and calcium signaling (gain of rhythmicity) pathways are shown in Figure 11B.
4.3 Case study three: methylation differential rhythmicity analysis between Snord116 +/− and wild type mice Circadian rhythm exists in many levels of biological activities other than transcriptome. This section applies the pipeline to DNA methylation data to show its generalizability to other omics data. In this study, Snord116-deleted (Snord116 +/− ) mice are contrasted against wild type (WT) mice to study the rhythm change due to the imprinted disorder Prader-Willi syndrome (PWS). For each genotype, cerebral cortex samples are collected at six time points across day, three at each time point (n 1 = n 2 = 18). The data is publically available at GEO with the accession number GSE103249.
To increase accuracy and interpretability, We filtered out the methylation measurements at sites where the average sequencing coverage is less than five and aggregated methylation sites into gene regions using the R package methylSig (Park et al. (2014)). Gene annotations were downloaded with UCSC Table Browser (Karolchik et al. (2004)) at http://genome.ucsc.edu. The M values were calculate to fit the cosinor model.

Types of rhythmicity of methylation level by gene
Out of 12,815 methylation regions identified, 271 are RhyI (wild type only) genes, 276 are RhyII (Snord116 +/− only) genes, and 15 are RhyBoth under p-value=0.05 cutoff ( Figure 12A (left)). We then performed pathway enrichment for genes with rhythmic methylation in WT (RhyI and RhyBoth) and Snord116 +/− (RhyII and RhyBoth) separately using Metascape( Figure 12B). The precise role of the genes with rhythmic methylation patterns in the brain related to PWS is unknown. However, enriched pathways related to female reproduction, including gonad development (p = 0.0036) and oocyte development (p=0.0011) may be associated with hypogonadism, which is commonly found in individuals with PWS.

Differential rhythmicity of methylation level by gene
Since we summarize the methylation level of gene regions, the rhythmicity amplitude does not change much with the transformed M values. As a result, we performed only differential phase test to the 15 RhyBoth genes and found 4 with significant shifted phase (p < 0.05): Gm16085, Gm44737, Gm9750, and Pglyrp3. With DR fitness test, we found 66 genes with significantly changed R 2 in methylation rhythmicity (p < 0.05): 33 with increased R 2 fitness in Snord116 +/− while 33 have decreased R 2 . Figure 12E shows one of the top genes with differential R 2 between the two genotypes (p-value = 0.012), where Utp14b is also associated with male infertility.

Remark on weak signal
The rhythmicity strength of the methylation data is relatively weak.The weak signal could be due to the small sample size (n = 18 per group), or the pooling of sequencing read counts of methylation loci in gene regions. However, the TOR categorization procedure only serves as a filtering step to select genes for biologically meaningful DR tests, where the type I error will be further controlled. In fact, we identified 242 significant genes at q-value< 0.5 ( Figure 12C). The left-skewed distribution of DRF p-values also demonstrates the DRF signals ( Figure 12D).

Case study four: detecting chronic shift-lag induced rhythm changes in rats using real-time RT-PCR measures
To further demonstrate that the pipeline also applies to low-dimensional data, we applied it to real-time RT-PCR data where mRNA from only five genes was reverse transcribed and quantified. In this study, rats  are randomly assigned to two light regimes (n = 36 per lighting regimen): 1) a 12:12 light-dark (LD) cycle, 2) a chronic shift-lag paradigm of 6-hour phase advances occurring every 2 days for 10 repeats. The two groups are then put into DD (completely dark) for 5-7 days until sacrificed at circadian time (CT) 3, 7, 11, 15, 19, and 23 (n=6 per time point). Note that in this study, CT is used instead of ZT, where CT12 is defined as the onset of running-wheel activity. More details of the study design can be found in Logan et al. (2012). The five quantified genes include two canonical clock genes, Per2 and Bmal1, and three cytokine-related genes IFNγ, perforin and granzyme B. A log2 transformation was performed to the raw measurements to symmetrize the data before the DiffCircaPipeline analyses. A summary of the analyses results and scatter plots for each gene are shown in Figure 13.

Canonical clock genes
The two canonical clock genes are RhyBoth under q<0.05. We further performed the DRF test and DRP (∆A, ∆ϕ, and ∆M ) test. Bmal1 has a significantly advanced phase of 8.45 hours (q=3.5e−03) in the shift-lag group. Per2 is significantly changed in all three parameters with an amplitude increase of 0.64 (q=3.5e−05), phase lag of 10.82 hours (q=1.3e−03), and MESOR increase of 0.7 (q=3.2e−08). The rhythm fitness R 2 is also increased by 0.35 by a marginal q-value.
Note that further normalization (e.g. scaling the measurements by maximum value) might eliminate the significant change of A and MESOR in Per2. Nonetheless, the distinguished phase difference suggests that rotating shift workers could suffer a significant disruption in circadian rhythm.

Cytokine-related genes
Among the three cytokine-related genes, only granzyme B is rhythmic in the control group, while IFNγ and perforin are not rhythmic in either condition. As a result, we performed test DRF test on granzyme B and found a decreased R 2 of 0.34 with a marginal q-value of 0.08.